ABSTRACT

This paper is the sixth in a series designed to
supplement the statistics training of students. The intended audience
is social science undergraduate and graduate students studying
applied statistics. The purpose of the applied statistics monographs
is to provide selected proofs and derivations of important
relationships or formulas that students do not find available and/or
comprehensible in journals, textbooks and similar sources. Derived are
the theoretical limits of the sample multivariate (or multiple)
correlation of one criterion (dependent variable) and any (finite)
number of predictors (independent variables). The proof given in this
paper involves deriving the individual terms of R. The lower limit
and upper limit of R are derived separately. (KR)
A Derivation of the Limits of the Sample Multivariate Correlation

Francis J. O'Brien, Jr., Ph.D. 
Introduction 

This paper is the sixth in a series of ERIC publications designed
to supplement the statistics training of students. For related
documents see O'Brien (1982a; 1982b; 1982c; 1984; 1987). The
intended audience for these papers is social science undergraduate
and graduate students studying applied statistics.

The purpose of these applied statistics monographs is to provide
selected proofs and derivations of important relationships or formulas
that students do not find available and/or comprehensible in journals,
textbooks and similar sources. For example, based on the author's
personal experience as a former applied statistics instructor at the
graduate level, few students would profit from a reading of Kendall and
Stuart (1967) to understand the proof provided in the present paper.
The unique feature of the papers in this series is detailed step-by-step
proofs or derivations written in a consistent notation system. Calculus
is neither used nor assumed. Each proof or derivation is presented
algebraically in detail.

The present paper assumes familiarity with the author's 1982c
paper (or equivalent knowledge). That paper formulated a detailed
derivation of the sample multiple correlation formula for one
dependent variable and p predictors for the linear model based on
standardized (z) variables.



Introduction to Proof 



In this paper we derive the theoretical limits of the sample
multivariate (or multiple) correlation of one criterion (dependent
variable) and any (finite) number of predictors (independent variables).
To facilitate the development of the proof, we will work with
standardized (z) variables. Although the proof could be presented in
the unstandardized ("raw score") form, standardized variables reduce
some of the algebraic detail.

Many students have learned that the multivariate correlation
between one dependent variable and a finite number of independent
variables can be expressed in terms of a sum of products of regression
weights and Pearson (zero-order) product-moment correlations between
the dependent and independent variables. This relationship holds only
for standardized variables. This correlation for p independent variables
can be written (see O'Brien, 1982c):




$$R = \sqrt{B_1 r_{y1} + B_2 r_{y2} + \cdots + B_j r_{yj} + \cdots + B_p r_{yp}}$$

Writing the right-hand side in summation notation,

$$R = \sqrt{\sum_{j=1}^{p} B_j r_{yj}}$$

where

$R$ = multiple correlation of p standardized variables.

$Z_y$ = the standardized dependent variable

$Z_1, Z_2, \ldots, Z_j, \ldots, Z_p$ = the standardized independent
variables

$B_1, B_2, \ldots, B_j, \ldots, B_p$ = beta (regression weights)
attached to each standardized independent variable*

$r_{y1}, r_{y2}, \ldots, r_{yj}, \ldots, r_{yp}$ = product-moment
(zero-order) dependent/independent variable correlations.
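
As a numerical illustration of this formula, the following Python sketch computes R for two predictors. The three correlations are hypothetical values chosen only for the example, and the closed-form expressions for the two beta weights are the standard two-predictor least squares solution, which is assumed here rather than derived in this paper. The function names are likewise illustrative.

```python
import math

def beta_weights_two_predictors(r_y1, r_y2, r_12):
    """Standardized regression weights for two predictors (closed form).

    Assumed from standard least squares theory; not derived in this paper.
    """
    denom = 1.0 - r_12 ** 2
    b1 = (r_y1 - r_y2 * r_12) / denom
    b2 = (r_y2 - r_y1 * r_12) / denom
    return b1, b2

def multiple_r(betas, r_y):
    """R as the square root of the sum of B_j * r_yj over the predictors."""
    return math.sqrt(sum(b * r for b, r in zip(betas, r_y)))

# Hypothetical correlations: criterion with each predictor, and between predictors.
r_y1, r_y2, r_12 = 0.5, 0.4, 0.3

b1, b2 = beta_weights_two_predictors(r_y1, r_y2, r_12)
R = multiple_r([b1, b2], [r_y1, r_y2])
print(b1, b2, R)
```

With these assumed correlations R comes out at roughly 0.56, which lies between 0 and 1.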

Many students know that the numerical limits on the above
multiple R are 0 and 1 (i.e., $0 \le R \le 1$). The purpose of this
paper is to prove that statement.

Proof that $0 \le R \le 1$

In this section we present a detailed proof that the limits of the 
multiple R are 0/1. First, a review is given of the notation and 
necessary definitions as well as the relevant results that were derived 
in O'Brien (1982c). 

We can state the formal linear regression prediction equation for
p standardized predictors as follows:**

$$Z_{\hat{y}i} = B_1 Z_{1i} + B_2 Z_{2i} + \cdots + B_j Z_{ji} + \cdots + B_p Z_{pi}$$

This equation represents the predicted standardized criterion
measure or score ($Z_{\hat{y}i}$) for the ith subject in the sample on the p
standardized variables $Z_1$ through $Z_p$.

* Technically, the beta weights ($B_j$) are called "standardized partial regression
coefficients". The formal notation in some standard textbooks is more elaborate than
ours (e.g., Hays, 1973 or Kendall and Stuart, 1967). As in previous papers, we have
kept the symbolism to a minimum to clarify the concepts in the development of
the proof.

** The coefficient "A" is not included for the reason given in O'Brien (1982c); i.e., it
"drops out" in the least squares derivations and so may be ignored.






The multiple correlation (or just R for short) for this regression
model of p standardized predictors may be defined conceptually as:

$$R = \mathrm{Corr}(Z_y, Z_{\hat{y}}) = \frac{\mathrm{Cov}(Z_y, Z_{\hat{y}})}{\sqrt{\mathrm{Var}(Z_y)\,\mathrm{Var}(Z_{\hat{y}})}}$$



where Corr is the correlation operator, Cov is the covariance operator,
and Var is the variance operator. Note that $Z_y$ is the random variable
that represents the "observed" or known information while $Z_{\hat{y}}$
represents the "predicted information".

The proof that is given in this paper involves deriving the 
individual terms of R. Two tables are provided for reference in the 
development of the proof. Table 1 summarizes familiar formulas for 
standardized variables. Table 2 is a summary of the results derived in 
O'Brien (1982c) for the multiple R of p standardized variables. The 
information in each table provides the essential building blocks of the 
proof. 
Table 1

Formulas and Relationships for Sample Standardized Variables

Name of Quantity: Sum
Formula: $$\sum_{i=1}^{n} Z_{ji} = 0$$
Note: n is sample size. The summation is understood to be across the
sample for a given predictor j.

Name of Quantity: Sum of Squares
Formula: $$\sum_{i=1}^{n} Z_{ji}^2 = n - 1$$
Note: Above note applies.

Name of Quantity: Mean
Formula: $$\frac{\sum_{i=1}^{n} Z_{ji}}{n} = \bar{Z}_j = 0$$
Note: Mean of jth predictor for total sample. The summation is
understood to be across the sample for a given predictor j.

Name of Quantity: Variance
Formula: $$\frac{\sum_{i=1}^{n} Z_{ji}^2}{n-1} = \mathrm{Var}(Z_j) = 1$$
Note: Variance of jth predictor for total sample. The summation is
understood to be across the sample for a given predictor j.

Name of Quantity: Correlation
Formula: $$\frac{\sum_{i=1}^{n} Z_{gi} Z_{yi}}{n-1} = r_{Z_g Z_y} = r_{gy}$$
Note: General zero-order correlation formula for any two standardized
variables, $Z_g$ and $Z_y$.

Note: Proof of these formulas/relationships may be found in O'Brien
(1982b, Appendix).
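
The relationships in Table 1 can be checked numerically. The sketch below standardizes two small sets of hypothetical raw scores using the n - 1 standard deviation and then evaluates each quantity in the table. The data and the helper name `standardize` are illustrative assumptions, not part of the original paper.

```python
import math

def standardize(xs):
    """Convert raw scores to z scores using the n - 1 sample standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return [(x - mean) / sd for x in xs]

# Hypothetical raw scores for two variables measured on the same n subjects.
x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
w = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0, 1.0, 8.0]
n = len(x)

zx, zw = standardize(x), standardize(w)

total = sum(zx)                                   # Sum: expected to be 0
sum_sq = sum(z ** 2 for z in zx)                  # Sum of squares: expected n - 1
mean = total / n                                  # Mean: expected 0
variance = sum_sq / (n - 1)                       # Variance: expected 1
r = sum(a * b for a, b in zip(zx, zw)) / (n - 1)  # Zero-order correlation

print(total, sum_sq, mean, variance, r)
```

Up to floating-point rounding, the first four quantities reproduce the table values for any data set with non-zero variance; the correlation depends on the data chosen.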
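Table 2

Formulas and Relationships for the Sample Multiple R

$$\mathrm{Cov}(Z_y, Z_{\hat{y}}) = \sum_{j=1}^{p} B_j r_{yj} = \sum_{j=1}^{p} B_j^2 + 2\sum_{i=1}^{p-1}\sum_{j=i+1}^{p} B_i B_j r_{ij} = \mathrm{Var}(Z_{\hat{y}}) = R^2$$

$$\mathrm{Var}(Z_y) = 1$$

$$R = \mathrm{Corr}(Z_y, Z_{\hat{y}}) = \sqrt{\sum_{j=1}^{p} B_j r_{yj}} = \sqrt{\sum_{j=1}^{p} B_j^2 + 2\sum_{i=1}^{p-1}\sum_{j=i+1}^{p} B_i B_j r_{ij}}$$

where $r_{yj}$ = dependent/independent variable
Pearson (zero-order) correlations, and
$r_{ij}$ = Pearson correlations among the p
independent variables

Note: Proof of these formulas/relationships may be found in O'Brien
(1982c).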
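
A short numerical check may help in reading Table 2. The Python sketch below uses hypothetical raw data for one criterion and two predictors, obtains the two beta weights from the standard two-predictor closed form (an assumption taken from least squares theory, not derived here), and compares the covariance, the variance of the predicted scores, and the two summation expressions. All variable and function names are illustrative.

```python
import math

def standardize(xs):
    """Convert raw scores to z scores using the n - 1 sample standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return [(x - mean) / sd for x in xs]

def corr(za, zb):
    """Pearson r for two standardized variables: sum of products over n - 1."""
    return sum(a * b for a, b in zip(za, zb)) / (len(za) - 1)

# Hypothetical raw data: one criterion and two predictors.
y = [2.0, 4.0, 5.0, 4.0, 5.0, 7.0, 8.0, 9.0]
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
n = len(y)

zy, z1, z2 = standardize(y), standardize(x1), standardize(x2)
r_y1, r_y2, r_12 = corr(zy, z1), corr(zy, z2), corr(z1, z2)

# Two-predictor beta weights (assumed closed form).
denom = 1.0 - r_12 ** 2
b1 = (r_y1 - r_y2 * r_12) / denom
b2 = (r_y2 - r_y1 * r_12) / denom

# Predicted standardized scores; their mean is zero, so sums over n - 1 suffice.
z_hat = [b1 * a + b2 * b for a, b in zip(z1, z2)]

cov_y_hat = sum(a * b for a, b in zip(zy, z_hat)) / (n - 1)
var_hat = sum(a * a for a in z_hat) / (n - 1)
var_y = sum(a * a for a in zy) / (n - 1)
sum_b_r = b1 * r_y1 + b2 * r_y2
expanded = b1 ** 2 + b2 ** 2 + 2.0 * b1 * b2 * r_12

# R from the conceptual definition: Cov / sqrt(Var * Var).
R = cov_y_hat / math.sqrt(var_y * var_hat)
print(cov_y_hat, var_hat, sum_b_r, expanded, R)
```

In this sketch the four quantities `cov_y_hat`, `var_hat`, `sum_b_r` and `expanded` should agree up to rounding, and `R` should equal the square root of their common value.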






As the reader can verify from Table 2, the covariance term is
$\mathrm{Cov}(Z_y, Z_{\hat{y}}) = \mathrm{Var}(Z_{\hat{y}}) = R^2$. These relationships constitute the "key"
to the proof for the 0/1 limits of R as developed in this paper. We now
demonstrate this proof. The development of the proof will consist of
two parts: one part will demonstrate the proof for the lower limit and
the other will show the proof for the upper limit. The lower limit is
now presented.



Proof of the lower limit 

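The proof of the lower limit ($R \ge 0$) is based on an algebraic
inequality and the information in Table 1 and Table 2. Recall the
conceptual definition of the sample variance of $Z_{\hat{y}}$:

$$\mathrm{Var}(Z_{\hat{y}}) = \frac{\sum_{i=1}^{n} (Z_{\hat{y}i} - \bar{Z}_{\hat{y}})^2}{n-1}$$

Because $Z_{\hat{y}}$ is a weighted sum of standardized variables, each of
which has a mean of zero (see Table 1), $\bar{Z}_{\hat{y}} = 0$. Thus,

$$\mathrm{Var}(Z_{\hat{y}}) = \frac{\sum_{i=1}^{n} Z_{\hat{y}i}^2}{n-1}$$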



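The reader will agree that the following algebraic inequality is a true
statement mathematically, since a sum of squares cannot be negative and
n - 1 is positive:

$$\frac{\sum_{i=1}^{n} Z_{\hat{y}i}^2}{n-1} \ge 0$$

From Table 2, this statement is equivalent to:

$$\sum_{j=1}^{p} B_j^2 + 2\sum_{i=1}^{p-1}\sum_{j=i+1}^{p} B_i B_j r_{ij} \ge 0$$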
But, as the reader can verify from Table 2, $\mathrm{Var}(Z_{\hat{y}}) = R^2$.
Hence,

$$\frac{\sum_{i=1}^{n} Z_{\hat{y}i}^2}{n-1} = \mathrm{Var}(Z_{\hat{y}}) = R^2$$

or

$$\mathrm{Var}(Z_{\hat{y}}) \ge 0$$

Since the value of the square root of a variance term is, by definition,
non-negative, then

$$\sqrt{\mathrm{Var}(Z_{\hat{y}})} \ge 0$$

or by substituting $R^2$,

$$\sqrt{R^2} \ge 0$$

Consequently,

$$R \ge 0.$$

The proof for the lower limit has been demonstrated. 
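
To see that the lower limit can actually be reached, consider a single predictor that is uncorrelated with the criterion. The Python sketch below uses hypothetical data constructed so that the zero-order correlation is zero; it relies on the fact that with one standardized predictor the beta weight equals that correlation (a standard result assumed here). The names are illustrative only.

```python
import math

def standardize(xs):
    """Convert raw scores to z scores using the n - 1 sample standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return [(x - mean) / sd for x in xs]

# Hypothetical data: x = y**2 - 2, which is uncorrelated with y.
y = [-2.0, -1.0, 0.0, 1.0, 2.0]
x = [2.0, -1.0, -2.0, -1.0, 2.0]
n = len(y)

zy, zx = standardize(y), standardize(x)
r_yx = sum(a * b for a, b in zip(zy, zx)) / (n - 1)

# With one standardized predictor, the beta weight equals r_yx (assumed result).
beta = r_yx
z_hat = [beta * z for z in zx]

# Var(Z_hat) is a sum of squares over n - 1, so it cannot be negative.
var_hat = sum(z * z for z in z_hat) / (n - 1)
R = math.sqrt(var_hat)
print(r_yx, var_hat, R)
```

Here the variance of the predicted scores, and therefore R, is zero up to floating-point rounding, so the bound $R \ge 0$ is attained rather than merely approached.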
Proof of the upper limit

The proof of the upper limit ($R \le 1$) follows with similar logic. The
reader will recall that the least squares criterion for standard scores can be
stated as follows (see O'Brien, 1982c):

$$\sum_{i=1}^{n} (Z_{yi} - Z_{\hat{y}i})^2 = \text{a minimum.}$$

We can also write the least squares criterion as:

$$\sum_{i=1}^{n} (Z_{yi} - Z_{\hat{y}i})^2 \ge 0$$

which is a true statement mathematically.

Our proof for the upper limit will consist of first expanding the above
squared sum, substituting quantities from Tables 1 and 2, and simplifying.
We then return to the inequality relation and conclude the derivation.

Expanding out the left side as a binomial and bringing in the
summation operator:

$$\sum_{i=1}^{n} (Z_{yi} - Z_{\hat{y}i})^2 = \sum_{i=1}^{n} Z_{yi}^2 + \sum_{i=1}^{n} Z_{\hat{y}i}^2 - 2\sum_{i=1}^{n} Z_{yi} Z_{\hat{y}i}$$

Each term can be simplified in turn. As shown in Table 1, the sum of
squared standardized scores in a sample is:

$$\sum_{i=1}^{n} Z_{yi}^2 = n - 1$$

where n is the sample size. As for the second
term in the expansion, that term reduces to
$$\sum_{i=1}^{n} Z_{\hat{y}i}^2 = (n-1)\,\mathrm{Var}(Z_{\hat{y}})$$

which is derived as an algebraic manipulation of the form given in Table 1.
The last term can be obtained in several steps by expansion and
manipulation as follows:

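$$2\sum_{i=1}^{n} Z_{yi} Z_{\hat{y}i} = 2\sum_{i=1}^{n} Z_{yi}\,(B_1 Z_{1i} + B_2 Z_{2i} + \cdots + B_p Z_{pi})$$

$$= 2\sum_{i=1}^{n} (B_1 Z_{1i} Z_{yi} + B_2 Z_{2i} Z_{yi} + \cdots + B_p Z_{pi} Z_{yi})$$

$$= 2\left[ B_1 \sum_{i=1}^{n} Z_{1i} Z_{yi} + B_2 \sum_{i=1}^{n} Z_{2i} Z_{yi} + \cdots + B_p \sum_{i=1}^{n} Z_{pi} Z_{yi} \right]$$

From Table 1, it can be seen that any term of the form
$\sum_{i=1}^{n} Z_{ji} Z_{yi}$ is equal to $(n-1)\,r_{yj}$. For correlations
involving the independent/dependent variables ($r_{yj}$), we have:

$$2\sum_{i=1}^{n} Z_{yi} Z_{\hat{y}i} = 2\left[ B_1 (n-1) r_{y1} + B_2 (n-1) r_{y2} + \cdots + B_p (n-1) r_{yp} \right]$$

$$= 2(n-1)\sum_{j=1}^{p} B_j r_{yj}.$$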



Collecting all terms together, we can now rewrite the least squares criterion
as:

$$\sum_{i=1}^{n} (Z_{yi} - Z_{\hat{y}i})^2 = (n-1) + (n-1)\,\mathrm{Var}(Z_{\hat{y}}) - 2(n-1)\sum_{j=1}^{p} B_j r_{yj} \ge 0$$

Upon factoring out n-1 and dividing it through the inequality, we have

$$\frac{\sum_{i=1}^{n} (Z_{yi} - Z_{\hat{y}i})^2}{n-1} = 1 + \mathrm{Var}(Z_{\hat{y}}) - 2\sum_{j=1}^{p} B_j r_{yj} \ge 0$$

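Now, from Table 2, we know that $\sum_{j=1}^{p} B_j r_{yj} = \mathrm{Var}(Z_{\hat{y}})$.

Thus,

$$\frac{\sum_{i=1}^{n} (Z_{yi} - Z_{\hat{y}i})^2}{n-1} = 1 + \mathrm{Var}(Z_{\hat{y}}) - 2\,\mathrm{Var}(Z_{\hat{y}}) \ge 0$$

or

$$\frac{\sum_{i=1}^{n} (Z_{yi} - Z_{\hat{y}i})^2}{n-1} = 1 - \mathrm{Var}(Z_{\hat{y}}) \ge 0$$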



Reversing the sense of the inequality, we can write the right hand side of
the above as:

$$\mathrm{Var}(Z_{\hat{y}}) \le 1.$$

This gives equivalently,

$$\sqrt{\mathrm{Var}(Z_{\hat{y}})} \le 1$$

But since $\mathrm{Var}(Z_{\hat{y}}) = R^2$, then

$$\sqrt{R^2} \le 1$$

or

$$R \le 1.$$

The proof is completed.
We have proven in this paper that $0 \le R \le 1$.
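
The steps of the upper-limit argument can be traced numerically for more than two predictors. The Python sketch below is only an illustration under stated assumptions: the raw data are hypothetical, and the beta weights are obtained by solving the usual normal equations for standardized variables (predictor correlation matrix times the beta weights equals the criterion correlations), which this paper does not restate. The helper names `standardize`, `corr` and `solve` are illustrative.

```python
import math

def standardize(xs):
    """Convert raw scores to z scores using the n - 1 sample standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return [(x - mean) / sd for x in xs]

def corr(za, zb):
    """Pearson r for two standardized variables: sum of products over n - 1."""
    return sum(a * b for a, b in zip(za, zb)) / (len(za) - 1)

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    size = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(size):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * pv for v, pv in zip(m[r], m[col])]
    return [m[i][size] / m[i][i] for i in range(size)]

# Hypothetical raw data: one criterion and p = 3 predictors.
y = [2.0, 4.0, 5.0, 4.0, 5.0, 7.0, 8.0, 9.0]
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
x3 = [2.0, 7.0, 1.0, 8.0, 2.0, 8.0, 1.0, 8.0]
n = len(y)

zy = standardize(y)
preds = [standardize(x) for x in (x1, x2, x3)]
p = len(preds)

r_xx = [[corr(preds[i], preds[j]) for j in range(p)] for i in range(p)]
r_y = [corr(zy, zj) for zj in preds]
betas = solve(r_xx, r_y)  # assumed normal equations for standardized variables

z_hat = [sum(b * zj[i] for b, zj in zip(betas, preds)) for i in range(n)]

term1 = sum(z * z for z in zy)                         # equals n - 1
term2 = sum(z * z for z in z_hat)                      # equals (n - 1) Var(Z_hat)
term3 = 2.0 * sum(a * b for a, b in zip(zy, z_hat))    # equals 2 (n - 1) sum B r
sse = sum((a - b) ** 2 for a, b in zip(zy, z_hat))     # least squares criterion

r_squared = sum(b * r for b, r in zip(betas, r_y))
R = math.sqrt(r_squared)
print(sse, (n - 1) * (1.0 - r_squared), R)
```

In this sketch the sum of squared residuals should match $(n-1)(1 - R^2)$ up to rounding, which is non-negative only if $R \le 1$.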